Indexing temporal information for web pages

Authors

  • Peiquan Jin
  • Hong Chen
  • Xujian Zhao
  • Xiaowen Li
  • Lihua Yue
Abstract

Temporal information plays an important role in Web search: every Web page has a crawl time, and most Web pages contain time keywords in their content. Integrating temporal information into Web search engines has been a research focus in recent years, and key issues such as temporal-textual indexing and temporal information extraction must be studied first. In this paper, we first present a framework for a temporal-textual Web search engine. We then concentrate on designing a new hybrid index structure for the temporal and textual information of Web pages. In particular, we propose integrating the B+-tree, the inverted file, and a typical temporal index called the MAP21-tree to handle temporal-textual queries. We study five mechanisms for implementing such a hybrid index, which differ in how they organize the inverted file, the B+-tree, and the MAP21-tree. After a theoretical analysis of the performance of these five index structures, we conduct experiments on both simulated and real data sets to compare their performance. The experimental results show that, among all the index schemes, the first-inverted-file-then-MAP21-tree index structure has the best query performance and is thus an acceptable choice as the temporal-textual index for future time-aware search engines.
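To make the "first inverted file, then temporal index" idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the class name `TemporalTextualIndex`, its methods, and the integer timestamps are hypothetical, and a sorted per-term posting list stands in for the MAP21-tree, whose actual structure is not described in the abstract.

```python
from bisect import bisect_right, insort
from collections import defaultdict


class TemporalTextualIndex:
    """Inverted file whose per-term postings are ordered by interval start.

    Assumption: a sorted list replaces the per-term temporal index
    (the MAP21-tree in the paper); timestamps are plain integers.
    """

    def __init__(self):
        # term -> sorted list of (start, end, page_id)
        self._postings = defaultdict(list)

    def add_page(self, page_id, terms, start, end):
        """Register a page whose temporal extent is [start, end]."""
        if start > end:
            raise ValueError("start must not exceed end")
        for term in set(terms):
            insort(self._postings[term], (start, end, page_id))

    def _term_hits(self, term, q_start, q_end):
        postings = self._postings.get(term, [])
        # Step 1: binary search keeps only intervals with start <= q_end.
        hi = bisect_right(postings, (q_end, float("inf"), float("inf")))
        # Step 2: filter the remaining candidates by end >= q_start.
        return {pid for _, end, pid in postings[:hi] if end >= q_start}

    def search(self, keywords, q_start, q_end):
        """Pages containing all keywords whose interval overlaps the query."""
        if not keywords:
            return []
        hits = None
        for term in keywords:
            term_hits = self._term_hits(term, q_start, q_end)
            hits = term_hits if hits is None else hits & term_hits
            if not hits:
                return []
        return sorted(hits)


index = TemporalTextualIndex()
index.add_page(1, ["olympics", "beijing"], 2008, 2008)
index.add_page(2, ["olympics", "london"], 2012, 2012)
index.add_page(3, ["olympics"], 2000, 2010)
print(index.search(["olympics"], 2005, 2009))
```

The lookup goes through the textual index first and only then applies the temporal filter within each term's postings, which mirrors the ordering named in the abstract. A real temporal index would avoid the linear scan in step 2.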


Similar resources

Evaluation of the Tehran University of Medical Sciences website based on webometrics criteria in 2008

Background and Aim: Nowadays, university websites are very important in information services. Universities have therefore designed websites to categorize large amounts of information and make it available. This study aimed to evaluate the Tehran University of Medical Sciences website based on webometrics criteria in 2008. Materials and Methods: This survey used the link analysis metho...


Temporal-Textual Retrieval: Time and Keyword Search in Web Documents

As the web ages, many web documents become relevant only to certain time periods, such as web-pages containing news and events or those documenting natural phenomena. Hence, to retrieve the most relevant pages, in addition to providing the relevant keywords, one may desire to identify the relevant time period(s) as well, e.g., “Barack Obama 1980-1985”. Unfortunately, not much work has been done...


Hybrid Index Structures for Temporal-Textual Web Search

Most Web pages contain temporal information. However, most previous studies consider only the update time of Web pages rather than fully exploiting the different temporal features of the Web. In this paper, we propose a novel approach to fusing different temporal features in Web pages to build an efficient index structure for temporal-textual Web search. Specifically, we focus on update time and content t...


Enhancing Personalized Indexing with XML

Studies show that revisiting Web pages constitutes a significant segment of Web navigation and information retrieval. To facilitate more efficient revisitation of Web pages, Personalized Indexing (PI) is being developed as a tool that enables users to index and categorize the Web pages of interest in their own terms. Furthermore, it allows a user to dynamically re-group their information collec...


A Method for Indexing Web Pages Using Web Bots

Exploring the content of web pages for automatic indexing is of fundamental importance for efficient e-commerce and other applications of the Web. It enables users, including customers and businesses, to locate the best sources for their use. Today’s search engines use one of two approaches to indexing web pages. They either (i) analyze the frequency of the words (after filtering out common or ...



Journal:
  • Comput. Sci. Inf. Syst.

Volume 8


Publication date: 2011